353 research outputs found
Analyzing the Social Structure and Dynamics of E-mail and Spam in Massive Backbone Internet Traffic
E-mail is probably the most popular application on the Internet, with
everyday business and personal communications dependent on it. Spam or
unsolicited e-mail has been estimated to cost businesses significant amounts of
money. However, our understanding of the network-level behavior of legitimate
e-mail traffic and how it differs from spam traffic is limited. In this study,
we have passively captured SMTP packets from a 10 Gbit/s Internet backbone link
to construct a social network of e-mail users based on their exchanged e-mails.
The focus of this paper is on the graph metrics indicating various structural
properties of e-mail networks and how they evolve over time. This study also
looks into the differences in the structural and temporal characteristics of
spam and non-spam networks. Our analysis of the collected data reveals several
differences between the behavior of spam and legitimate e-mail traffic, which
can help us understand the behavior of spammers and give us the knowledge to
statistically model spam traffic at the network level in order to complement
current spam detection techniques.
Comment: 15 pages, 20 figures, technical report
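The entry above builds a social graph of e-mail users from captured SMTP traffic and compares structural metrics of spam and non-spam networks. As a toy illustration only (the paper's datasets and exact graph metrics are not reproduced here; all names and the sample data below are hypothetical), a directed e-mail graph with simple degree statistics can be sketched as:

```python
from collections import defaultdict

def build_email_graph(messages):
    """Build a directed e-mail graph from (sender, receiver) pairs:
    nodes are addresses, edges are observed sender -> receiver links,
    and degrees count individual messages."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    edges = set()
    for sender, receiver in messages:
        edges.add((sender, receiver))
        out_deg[sender] += 1
        in_deg[receiver] += 1
    return edges, out_deg, in_deg

def degree_stats(deg):
    """Return (max degree, mean degree); heavy out-degree skew with no
    replies is one structural signal one might look for in spam sources."""
    vals = list(deg.values()) or [0]
    return max(vals), sum(vals) / len(vals)
```

A bulk sender that never receives replies then stands out immediately in the out-degree/in-degree comparison.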
Efficient Lock-free Binary Search Trees
In this paper we present a novel algorithm for concurrent lock-free internal
binary search trees (BST) and implement a Set abstract data type (ADT) based on
that. We show that in the presented lock-free BST algorithm the amortized step
complexity of each set operation - {\sc Add}, {\sc Remove} and {\sc Contains} -
is , where, is the height of BST with number of nodes
and is the contention during the execution. Our algorithm adapts to
contention measures according to read-write load. If the situation is
read-heavy, the operations avoid helping pending concurrent {\sc Remove}
operations during traversal, and adapt to interval contention. However, for
write-heavy situations we let an operation help pending {\sc Remove}, even
though it is not obstructed, and so adapt to tighter point contention. It uses
single-word compare-and-swap (\texttt{CAS}) operations. We show that our
algorithm has improved disjoint-access-parallelism compared to similar existing
algorithms. We prove that the presented algorithm is linearizable. To the best
of our knowledge this is the first algorithm for any concurrent tree data
structure in which the modify operations are performed with an additive term of
contention measure.
Comment: 15 pages, 3 figures, submitted to PODC
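To make the Set ADT interface concrete, here is a plain sequential sketch of an internal BST with Add, Remove and Contains. This deliberately omits the paper's actual contribution, the lock-free synchronization via single-word CAS and the adaptive helping of pending Remove operations; it only shows what "internal BST" means (keys live in internal nodes, and removing a node with two children moves the in-order successor's key up):

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

class BSTSet:
    """Sequential sketch of the Set ADT over an internal BST."""
    def __init__(self):
        self.root = None
    def contains(self, key):
        node = self.root
        while node:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False
    def add(self, key):
        if self.root is None:
            self.root = Node(key)
            return True
        parent, node = None, self.root
        while node:
            if key == node.key:
                return False        # already present
            parent = node
            node = node.left if key < node.key else node.right
        if key < parent.key:
            parent.left = Node(key)
        else:
            parent.right = Node(key)
        return True
    def remove(self, key):
        parent, node = None, self.root
        while node and node.key != key:
            parent = node
            node = node.left if key < node.key else node.right
        if node is None:
            return False
        if node.left and node.right:
            # internal node with two children: pull up the in-order
            # successor's key, then unlink the successor instead
            succ_parent, succ = node, node.right
            while succ.left:
                succ_parent, succ = succ, succ.left
            node.key = succ.key
            parent, node = succ_parent, succ
        child = node.left or node.right
        if parent is None:
            self.root = child
        elif parent.left is node:
            parent.left = child
        else:
            parent.right = child
        return True
```

In the lock-free version each child-pointer update above would become a single-word CAS, with concurrent operations either skipping or helping pending removals depending on the read-write load.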
Self-stabilizing TDMA Algorithms for Wireless Ad-hoc Networks without External Reference
Time division multiple access (TDMA) is a method for sharing communication
media. In wireless communications, TDMA algorithms often divide the radio time
into timeslots of uniform size, which are then combined into frames of
uniform size. We consider TDMA algorithms that allocate at least one
timeslot in every frame to every node. Given a maximal node degree,
and no access to external references for collision detection, time or position,
we consider the problem of collision-free self-stabilizing TDMA algorithms that
use constant frame size.
We demonstrate that this problem has no solution when the frame size is
smaller than $\chi_2$, where $\chi_2$ is the chromatic number for
distance-2 vertex coloring. As a complement to this lower bound, we focus on
proving the existence of collision-free self-stabilizing TDMA algorithms that
use a constant frame size. We consider basic settings (no hardware
support for collision detection and no prior clock synchronization), and the
collision of concurrent transmissions from transmitters that are at most two
hops apart. In the context of self-stabilizing systems that have no external
reference, we are the first to study this problem (to the best of our
knowledge), and use simulations to show convergence even with computation time
uncertainties.
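The distance-2 coloring in the lower bound matters because any two nodes within two hops of each other can collide at a common neighbour. As a purely illustrative, centralised sketch of such a slot assignment (the paper's algorithms are distributed, self-stabilizing, and use no external reference, none of which is modeled here), a greedy distance-2 coloring assigns timeslots as follows:

```python
def assign_timeslots(adj):
    """Greedy distance-2 vertex coloring over an adjacency dict:
    nodes at most two hops apart receive different timeslots, so
    their transmissions cannot collide at a shared neighbour.
    Returns {node: slot}; the frame size is the number of slots used."""
    slot = {}
    for v in sorted(adj):
        # collect all nodes within two hops of v
        two_hop = set(adj[v]) | {w for u in adj[v] for w in adj[u]}
        two_hop.discard(v)
        taken = {slot[u] for u in two_hop if u in slot}
        s = 0
        while s in taken:
            s += 1          # first free slot
        slot[v] = s
    return slot
```

On a path a-b-c-d this uses three slots, and the endpoints a and d (three hops apart) may safely reuse slot 0.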
Analyzing the Performance of Lock-Free Data Structures: A Conflict-based Model
This paper considers the modeling and the analysis of the performance of
lock-free concurrent data structures. Lock-free designs employ an optimistic
conflict control mechanism, allowing several processes to access the shared
data object at the same time. They guarantee that at least one concurrent
operation finishes in a finite number of its own steps regardless of the state
of the operations. Our analysis considers such lock-free data structures that
can be represented as linear combinations of fixed size retry loops. Our main
contribution is a new way of modeling and analyzing a general class of
lock-free algorithms, achieving predictions of throughput that are close to
what we observe in practice. We emphasize two kinds of conflicts that shape the
performance: (i) hardware conflicts, due to concurrent calls to atomic
primitives; (ii) logical conflicts, caused by simultaneous operations on the
shared data structure. We show how to deal with these hardware and logical
conflicts separately, and how to combine them, so as to calculate the
throughput of lock-free algorithms. We also propose a common framework that
enables a fair comparison between lock-free implementations by covering the
whole contention domain, together with a better understanding of the
performance impacting factors. This part of our analysis comes with a method
for calculating a good back-off strategy to finely tune the performance of a
lock-free algorithm. Our experimental results, based on a set of widely used
concurrent data structures and on abstract lock-free designs, show that our
analysis closely follows the actual code behavior.
Comment: Short version to appear in DISC'15
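As a much-simplified flavour of this kind of throughput analysis (this is not the paper's model; the two-regime bound below is a standard back-of-the-envelope argument), one can combine a contention-free regime, where throughput scales with the thread count, with a saturated regime, where successful retry loops serialise on the shared object:

```python
def throughput_bound(threads, parallel_work, critical_work):
    """Toy throughput estimate for fixed-size retry loops.

    Each operation spends `parallel_work` time units outside its retry
    loop and `critical_work` inside it. Under low contention, every
    thread completes one operation per (parallel_work + critical_work)
    units; under high contention the successful critical sections
    serialise, capping throughput at 1 / critical_work."""
    low_contention = threads / (parallel_work + critical_work)
    high_contention = 1.0 / critical_work
    return min(low_contention, high_contention)
```

The resulting curve rises linearly and then plateaus, which is the qualitative shape the conflict-based model refines by distinguishing hardware conflicts (on the atomic primitives) from logical conflicts (wasted retries).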
Configurable Strategies for Work-stealing
Work-stealing systems are typically oblivious to the nature of the tasks they
are scheduling. For instance, they do not know or take into account how long a
task will take to execute or how many subtasks it will spawn. Moreover, the
actual task execution order is typically determined by the underlying task
storage data structure, and cannot be changed. There are thus possibilities for
optimizing task parallel executions by providing information on specific tasks
and their preferred execution order to the scheduling system.
We introduce scheduling strategies to enable applications to dynamically
provide hints to the task-scheduling system on the nature of specific tasks.
Scheduling strategies can be used to independently control both local task
execution order as well as steal order. In contrast to conventional scheduling
policies that are normally global in scope, strategies allow the scheduler to
apply optimizations on individual tasks. This flexibility greatly improves
composability as it allows the scheduler to apply different, specific
scheduling choices for different parts of applications simultaneously. We
present a number of benchmarks that highlight diverse, beneficial effects that
can be achieved with scheduling strategies. Some benchmarks (branch-and-bound,
single-source shortest path) show that prioritization of tasks can reduce the
total amount of work compared to standard work-stealing execution order. For
other benchmarks (triangle strip generation) qualitatively better results can
be achieved in shorter time. Other optimizations, such as dynamic merging of
tasks or stealing of half the work, instead of half the tasks, are also shown
to improve performance. Composability is demonstrated by examples that combine
different strategies, both within the same kernel (prefix sum) as well as when
scheduling multiple kernels (prefix sum and unbalanced tree search).
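The core idea, per-task hints that control both local execution order and steal order, can be sketched with a minimal single-worker model (the API and data structures below are hypothetical; the real system's strategies, per-worker deques and steal protocol are far richer):

```python
import heapq

class StrategyQueue:
    """Toy per-worker task queue with priority hints: the owner executes
    its most urgent task first (lowest priority value), while a thief
    steals the task the owner values least, e.g. the most speculative
    branch in branch-and-bound."""
    def __init__(self):
        self._tasks = []   # heap of (priority, seq, task)
        self._seq = 0      # tie-breaker preserving spawn order
    def spawn(self, task, priority=0):
        heapq.heappush(self._tasks, (priority, self._seq, task))
        self._seq += 1
    def pop_local(self):
        return heapq.heappop(self._tasks)[2] if self._tasks else None
    def steal(self):
        if not self._tasks:
            return None
        # take the entry with the highest (worst) priority value
        i = max(range(len(self._tasks)), key=lambda j: self._tasks[j][0])
        task = self._tasks[i][2]
        self._tasks[i] = self._tasks[-1]
        self._tasks.pop()
        heapq.heapify(self._tasks)
        return task
```

Because the hint is attached to individual tasks rather than set globally, different parts of an application can use different orders simultaneously, which is the composability point made above.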
Shared-object System Equilibria: Delay and Throughput Analysis
We consider shared-object systems that require their threads to fulfill the
system jobs by first acquiring sequentially the objects needed for the jobs and
then holding on to them until job completion. Such systems are at the core
of a variety of shared-resource allocation and synchronization systems. This
work opens a new perspective to study the expected job delay and throughput
analytically, given the possible set of jobs that may join the system
dynamically.
We identify the system dependencies that cause contention among the threads
as they try to acquire the job objects. We use these observations to define the
shared-object system equilibria. We note that the system is in equilibrium
whenever the rate at which jobs arrive at the system matches the job completion
rate. These equilibria consider not only the job delay but also the job
throughput, as well as the time in which each thread blocks other threads in
order to complete its job. We then further study in detail the thread work
cycles and, by using a graph representation of the problem, we are able to
propose procedures for finding and estimating equilibria, i.e., discovering the
job delay and throughput, as well as the blocking time.
To the best of our knowledge, this is a new perspective that can provide
better analytical tools for the problem, in order to estimate performance
measures similar to those that can be acquired through experimentation on
working systems and simulations, e.g., job delay and throughput in
(distributed) shared-object systems.
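A toy version of the graph representation mentioned above can make the dependency structure concrete (the job names and the crude delay formula below are illustrative only; the paper's equilibrium analysis, matching arrival rate to completion rate, is not reproduced here):

```python
def conflict_graph(jobs):
    """Jobs map names to the ordered list of objects they acquire and
    hold until completion. Two jobs conflict when they share at least
    one object; the result is an adjacency dict over job names."""
    names = list(jobs)
    adj = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if set(jobs[a]) & set(jobs[b]):
                adj[a].add(b)
                adj[b].add(a)
    return adj

def toy_delay(jobs, service_time=1.0):
    """Crude estimate: a job may have to wait behind every job it
    conflicts with, so its delay grows with its degree in the conflict
    graph. Real equilibria also depend on arrival rates and blocking
    times, which this sketch ignores."""
    adj = conflict_graph(jobs)
    return {n: service_time * (1 + len(adj[n])) for n in adj}
```

A job touching no shared object completes in one service time, while each added conflict edge lengthens the estimated delay, mirroring how contention among threads drives the equilibrium.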
The Lock-free k-LSM Relaxed Priority Queue
Priority queues are data structures which store keys in an ordered fashion to
allow efficient access to the minimal (maximal) key. Priority queues are
essential for many applications, e.g., Dijkstra's single-source shortest path
algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data
structures that can be used concurrently and scale to large numbers of threads
and cores. Lock-free data structures promise superior scalability by avoiding
blocking synchronization primitives, but the \emph{delete-min} operation is an
inherent scalability bottleneck in concurrent priority queues. Recent work has
focused on alleviating this obstacle either by batching operations, or by
relaxing the requirements to the \emph{delete-min} operation.
We present a new, lock-free priority queue that relaxes the \emph{delete-min}
operation so that it is allowed to delete \emph{any} of the $k$ smallest
keys, where $k$ is a runtime configurable parameter. Additionally, the
behavior is identical to a non-relaxed priority queue for items added and
removed by the same thread. The priority queue is built from a logarithmic
number of sorted arrays in a way similar to log-structured merge-trees. We
experimentally compare our priority queue to recent state-of-the-art lock-free
priority queues, both with relaxed and non-relaxed semantics, showing high
performance and good scalability of our approach.
Comment: Short version as ACM PPoPP'15 poster
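A toy sequential model of the structure described above, writing k for the relaxation parameter: keys live in sorted arrays that are merged log-structured-merge style, and delete-min may return any of the k smallest keys. The real data structure is lock-free and far more efficient; this sketch (with an intentionally naive candidate scan) only mirrors the relaxed semantics:

```python
import random

class RelaxedLSMPQ:
    """Sequential sketch of a relaxed LSM-style priority queue."""
    def __init__(self, k=1, seed=None):
        self.k = k
        self.arrays = []              # list of sorted lists
        self.rng = random.Random(seed)
    def insert(self, key):
        new = [key]
        # merge equal-sized components, as in log-structured merge trees
        while self.arrays and len(self.arrays[-1]) == len(new):
            new = sorted(self.arrays.pop() + new)
        self.arrays.append(new)
    def delete_min(self):
        # gather the k smallest keys overall (naive but matches the
        # relaxed specification) and delete one of them at random
        candidates = sorted(
            (key, i) for i, arr in enumerate(self.arrays) for key in arr
        )[: self.k]
        if not candidates:
            return None
        key, i = self.rng.choice(candidates)
        self.arrays[i].remove(key)
        self.arrays = [a for a in self.arrays if a]
        return key
```

With k = 1 this degenerates to a strict priority queue, which matches the abstract's claim that the relaxation is runtime configurable.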
MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is very useful in optimization problems
with high-dimensional non-convex target functions, and hence constitutes an
important component of several Machine Learning and Data Analytics methods.
Recently there have been significant works on understanding the parallelism
inherent to SGD, and its convergence properties. Asynchronous, parallel SGD
(AsyncPSGD) has received particular attention, due to observed performance
benefits. On the other hand, asynchrony implies inherent challenges in
understanding the execution of the algorithm and its convergence, stemming from
the fact that the contribution of a thread might be based on an old (stale)
view of the state. In this work we aim to deepen the understanding of AsyncPSGD
in order to increase the statistical efficiency in the presence of stale
gradients. We propose new models for capturing the nature of the staleness
distribution in a practical setting. Using the proposed models, we derive a
staleness-adaptive SGD framework, MindTheStep-AsyncPSGD, for adapting the step
size in an online fashion, which provably reduces the negative impact of
asynchrony. Moreover, we provide general convergence time bounds for a wide
class of staleness-adaptive step size strategies for convex target functions.
We also provide a detailed empirical study, showing how our approach implies
faster convergence for deep learning applications.
Comment: 12 pages, 3 figures, accepted in IEEE BigData 2019
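To illustrate the kind of step-size adaptation meant above (the rule, the constants, and the delay model below are hypothetical stand-ins, not the paper's MindTheStep formulas), damping the step size in the gradient's staleness keeps an iteration with delayed gradients stable:

```python
from collections import deque

def adaptive_step(base_step, staleness, decay=0.5):
    """Hypothetical staleness-adaptive rule: shrink the step size
    geometrically in the number of updates the gradient is stale by,
    so older contributions move the model less."""
    return base_step * decay ** staleness

def run_async_sgd(tau=3, steps=60, base_step=0.4):
    """Minimise f(x) = x^2 with gradients computed on a view that is
    `tau` updates old -- a crude single-threaded stand-in for the
    stale views produced by asynchronous parallel workers."""
    x = 5.0
    history = deque([x], maxlen=tau + 1)   # recent iterates
    for _ in range(steps):
        x_old = history[0]                 # stale view of the state
        grad = 2.0 * x_old                 # f'(x) = 2x
        x -= adaptive_step(base_step, len(history) - 1) * grad
        history.append(x)
    return x
```

With the full step 0.4 applied to a 3-step-stale gradient this recurrence would oscillate badly; the damped effective step keeps the delayed iteration contracting toward the optimum at 0.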
Monotonically relaxing concurrent data-structure semantics for performance: An efficient 2D design framework
There has been a significant amount of work in the literature proposing
semantic relaxation of concurrent data structures for improving scalability and
performance. By relaxing the semantics of a data structure, a bigger design
space, that allows weaker synchronization and more useful parallelism, is
unveiled. Investigating new data structure designs, capable of trading
semantics for achieving better performance in a monotonic way, is a major
challenge in the area. We algorithmically address this challenge in this paper.
We present an efficient, lock-free, concurrent data structure design framework
for out-of-order semantic relaxation. Our framework introduces a new two
dimensional algorithmic design, that uses multiple instances of a given data
structure. The first dimension of our design is the number of data structure
instances operations are spread to, in order to benefit from parallelism
through disjoint memory access. The second dimension is the number of
consecutive operations that try to use the same data structure instance in
order to benefit from data locality. Our design can flexibly explore this
two-dimensional space to achieve the property of monotonically relaxing
concurrent data structure semantics for achieving better throughput performance
within a tight deterministic relaxation bound, as we prove in the paper. We
show how our framework can instantiate lock-free out-of-order queues, stacks,
counters and deques. We provide implementations of these relaxed data
structures and evaluate their performance and behaviour on two parallel
architectures. Experimental evaluation shows that our two-dimensional data
structures significantly outperform the respective previously proposed ones in
scalability and throughput performance. Moreover, their throughput increases
monotonically as the relaxation increases.
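The two dimensions can be sketched with a plain, single-threaded model (illustrative only; the paper's designs are lock-free and come with proven relaxation bounds, neither of which is reproduced here). Dimension one is the number of sub-queue instances, giving disjoint memory access; dimension two is how many consecutive operations reuse the same instance, giving locality:

```python
from collections import deque

class TwoDimRelaxedQueue:
    """Single-threaded sketch of a 2D-relaxed FIFO queue: `width`
    queue instances (dimension 1: disjoint access) and `depth`
    consecutive operations per instance (dimension 2: locality).
    FIFO order is only approximately preserved; the reordering grows
    with width and depth."""
    def __init__(self, width=4, depth=2):
        self.queues = [deque() for _ in range(width)]
        self.depth = depth
        self._enq = [-1, 0]   # [current instance, remaining uses]
        self._deq = [-1, 0]
    def _pick(self, state):
        idx, left = state
        if left == 0:                       # locality budget spent:
            idx = (idx + 1) % len(self.queues)   # round-robin onward
            left = self.depth
        state[0], state[1] = idx, left - 1
        return idx
    def enqueue(self, item):
        self.queues[self._pick(self._enq)].append(item)
    def dequeue(self):
        for _ in range(len(self.queues)):
            idx = self._pick(self._deq)
            if self.queues[idx]:
                return self.queues[idx].popleft()
            self._deq[1] = 0    # instance empty: advance next time
        return None
```

Raising `width` spreads concurrent operations over more memory, while raising `depth` keeps a thread on one instance longer; tuning the pair trades ordering precision for throughput, which is exactly the monotonic trade-off the abstract describes.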